{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
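    "# $ \\chi^2 $ Test for $ R \\times C $ Contingency Tables (2)\n",
    "\n",
    "## Example\n",
    "\n",
    "Investigate whether the population distribution of blood types (the composition of types A, B, AB and O) differs between regions.\n",
    "\n",
    "### Data"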
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (3, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>continent</th><th>A</th><th>B</th><th>AB</th><th>O</th></tr><tr><td>str</td><td>i64</td><td>i64</td><td>i64</td><td>i64</td></tr></thead><tbody><tr><td>&quot;Asia&quot;</td><td>321</td><td>369</td><td>95</td><td>295</td></tr><tr><td>&quot;Europe&quot;</td><td>258</td><td>43</td><td>22</td><td>194</td></tr><tr><td>&quot;N.America&quot;</td><td>408</td><td>106</td><td>37</td><td>444</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (3, 5)\n",
       "┌───────────┬─────┬─────┬─────┬─────┐\n",
       "│ continent ┆ A   ┆ B   ┆ AB  ┆ O   │\n",
       "│ ---       ┆ --- ┆ --- ┆ --- ┆ --- │\n",
       "│ str       ┆ i64 ┆ i64 ┆ i64 ┆ i64 │\n",
       "╞═══════════╪═════╪═════╪═════╪═════╡\n",
       "│ Asia      ┆ 321 ┆ 369 ┆ 95  ┆ 295 │\n",
       "│ Europe    ┆ 258 ┆ 43  ┆ 22  ┆ 194 │\n",
       "│ N.America ┆ 408 ┆ 106 ┆ 37  ┆ 444 │\n",
       "└───────────┴─────┴─────┴─────┴─────┘"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import polars as pl\n",
    "\n",
    "df = pl.read_csv(\"B_09_7.csv\")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
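    "### Analysis\n",
    "\n",
    "The data form a $ 3 \\times 4 $ contingency table (3 regions by 4 blood types), so the $ \\chi^2 $ test for $ R \\times C $ contingency tables is used.\n",
    "\n",
    "### Hypotheses\n",
    "\n",
    "$ H_0 $: the population blood-type distributions of the regions are the same  \n",
    "$ H_1 $: the population blood-type distributions of the regions are not all the same\n",
    "\n",
    "### Test"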
   ]
  },
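  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the Pearson statistic behind this test can be written out explicitly. Here $ A $ is an observed count and $ T $ is its expected (theoretical) count under $ H_0 $. $ n_R $ and $ n_C $ are the corresponding row and column totals, and $ n $ is the grand total. This notation is the usual textbook one and is assumed here rather than taken from the original notes.\n",
    "\n",
    "$$\n",
    "\\chi^2 = \\sum \\frac{(A - T)^2}{T}, \\qquad T_{RC} = \\frac{n_R \\, n_C}{n}, \\qquad \\nu = (R - 1)(C - 1)\n",
    "$$\n",
    "\n",
    "For the Asia / type A cell: $ T_{11} = \\frac{1080 \\times 987}{2592} = 411.25 $. The degrees of freedom are $ \\nu = (3 - 1)(4 - 1) = 6 $, which agrees with the value reported by `chi2_contingency`."
   ]
  },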
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "chi square value = 297.37495990470273\n",
      "degree of freedom = 6\n",
      "p value = 2.9868146237744393e-61\n"
     ]
    }
   ],
   "source": [
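    "from scipy.stats import chi2_contingency\n",
    "\n",
    "# Observed counts: drop the label column \"continent\" and keep the A, B, AB, O counts\n",
    "observed = df.select(df.columns[1:]).to_numpy()\n",
    "\n",
    "# Pearson chi-square test; Yates' continuity correction is only applied when dof == 1\n",
    "res = chi2_contingency(observed)\n",
    "\n",
    "print(f\"chi square value = {res.statistic}\")\n",
    "print(f\"degree of freedom = {res.dof}\")\n",
    "print(f\"p value = {res.pvalue}\")"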
   ]
  },
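  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below is a minimal sketch that recomputes the test by hand with NumPy, to make the steps inside `chi2_contingency` visible. The observed counts are copied from the table above. The names `observed`, `expected`, `statistic`, `pvalue` and `share_below_5` are illustrative only.\n",
    "\n",
    "The cell also checks a common textbook rule of thumb for the $ R \\times C $ $ \\chi^2 $ test: no expected count below 1, and at most 1/5 of cells with an expected count below 5. This check is an assumption added here, not a step from the original analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.stats import chi2, chi2_contingency\n",
    "\n",
    "# Observed counts copied from the table above\n",
    "# (rows: Asia, Europe, N.America; columns: A, B, AB, O)\n",
    "observed = np.array([\n",
    "    [321, 369, 95, 295],\n",
    "    [258, 43, 22, 194],\n",
    "    [408, 106, 37, 444],\n",
    "])\n",
    "\n",
    "n = observed.sum()\n",
    "row_totals = observed.sum(axis=1, keepdims=True)  # shape (3, 1)\n",
    "col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 4)\n",
    "\n",
    "# Expected count of every cell under H0: row total * column total / n\n",
    "expected = row_totals * col_totals / n\n",
    "\n",
    "# Pearson statistic: sum over all cells of (observed - expected)^2 / expected\n",
    "statistic = ((observed - expected) ** 2 / expected).sum()\n",
    "\n",
    "# Degrees of freedom: (R - 1)(C - 1)\n",
    "dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)\n",
    "\n",
    "# Upper-tail probability of the chi-square distribution with dof degrees of freedom\n",
    "pvalue = chi2.sf(statistic, dof)\n",
    "\n",
    "# Rule-of-thumb applicability check (assumed, see the note above)\n",
    "share_below_5 = (expected < 5).mean()\n",
    "\n",
    "print(f\"chi square value = {statistic}\")\n",
    "print(f\"degree of freedom = {dof}\")\n",
    "print(f\"p value = {pvalue}\")\n",
    "print(f\"min expected = {expected.min():.2f}, share of cells with expected < 5 = {share_below_5:.2f}\")\n",
    "\n",
    "# Cross-check against scipy (no Yates correction is applied because dof > 1)\n",
    "res = chi2_contingency(observed)\n",
    "assert np.isclose(statistic, res.statistic) and dof == res.dof\n",
    "assert np.allclose(expected, res.expected_freq)"
   ]
  },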
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
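    "$ P \\approx 3.0 \\times 10^{-61} < 0.05 $. At the significance level $ \\alpha = 0.05 $, we reject $ H_0 $ and accept $ H_1 $: the population blood-type distributions can be considered to differ between regions."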
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
